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fvj Abstract. We extend the recent sparse Fourier algorithm of |LWC13j to the noisy set- 
ting. We present two such extensions, the second of which exhibits a novel form of error- 

f^ , correction not unlike that of the /3-encoders in analog-to-digital conversion |DDGV06| . 

^^rfH The algorithm runs in time 0{klog{k)log{N/k)) on average, provided the noise is not 

^ overwhelming. The error-correction property allows the algorithm to outperform FFTW 

^sO over a wide range of sparsity and noise values, and is to the best of our knowledge novel 

^~~^ in the sparse Fourier transform context. 



"j^ 1. Introduction 

S , , 

'— ' The Fast Fourier Transform (FFT) |CT65j is a fundamental numerical algorithm whose 

^~~^ importance in a wide variety of applications cannot be overstated. The FFT reduces the 

f^">- runtime complexity of calculating the discrete Fourier transform (DFT) of a length N array 

ly-s from the naive 0(A^^) to 0(A'^log(A^)). At the time of its introduction in the mid-1960s, it 

. dramatically increased the size of problems that a typical computer could handle. Over the 

(^ past fifty years the typical size of data sets has grown by orders of magnitude, and in certain 

I application areas (e.g. ultra-wideband radar) the computation of the full FFT is no longer 

J> tractable on commodity hardware. In this and other instances, however, it is known a priori 

S^ that the signals of interest have small frequency support; that is, their Fourier transforms 

C^ are sparse. This problem has received attention from a number of research communities 

over the past decade, who have shown that it is possible to significantly outperform the 

FFT in both runtime and sampling requirements when the number of significant Fourier 

modes k is much less than the nominal bandwidth A^. 

The earliest work to specifically address the sparse Fourier transform problem was |GGI^02| , 
which gave a randomized algorithm with runtime and sampling complexity 0(A;^ polylog(A^)). 
This was later improved to 0(A;polylog(A^)) [GMS05J through the use of unequally-spaced 
FFTs |AD96j . For a given failure probability 6 and accuracy parameter e, the algorithm 
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returns a k-teim approximation y to the DFT of the input x that is near-optimal: with 
probabilty 1 — 6 it holds that 

(1.1) ||x-y||l<(l + e)||x-Xfe||i. 

Here x^ is the best /c-term approximation to x. A separate group of authors |HIKP12bj has 
developed a modified version of this algorithm with runtime 0(log(A^)Y'iVfclog(]V)). While 
the dependence on N is sub-optimal asymptotically, in practice this algorithm is significantly 
faster than either |GGI"'"02j or |GMS05J . The same authors presented an improved algorithm 
with runtime 0(A:log(iV) log(iV/fe)) in |HIKP12a] whose frequency identification prodecure 
is very similar to [LWC13J , upon which the present work is based. However, the performance 
of |HIKP12a] in the presence of noise has yet to be evaluated empirically. 

The algorithms described in the previous paragraph are all randomized, and so will fail 
on some small subset of potential inputs. Recognizing this as a potential detriment in 
failure- intolerant applications, two authors have independently given deterministic algo- 
rithms for the sparse Fourier transform problem. In |AkalO| an algorithm with runtime 
poly(fc, log(A^)) was given where the exponent on k is at least six. This high dependence on 
k renders the algorithm infeasible in practice, and it has not been implemented. In [IwelO] , 
the combinatorial properties of aliasing among frequencies were exploited to give an algo- 
rithm with runtime and sampling complexity 0(A;^polylog(A'^)). While this represented a 
major improvement over the theoretical runtime complexity of |AkalO] . in practice it only 
outperformed the FFT for relatively modest values of the sparsity k. 

Most recently the authors of |LWC13J gave a deterministic algorithm with average-case 
sampling and runtime complexity 0{klog{N)). The worst-case runtime bounds are asymp- 
totically of the same order as |IwelOj , but over a representative class of random signals it 
was shown to significantly outperform its deterministic and randomized competitors. This 
was achieved by sampling the input at two sets of equispaced points slightly offset in time. 
This time shift appears in the Fourier domain as a frequency modulation, which allows 
the authors to both detect when aliasing has occurred and, for frequencies that are iso- 
lated (i.e. not aliased), to calculate the frequency value directly. While [IwelOj also uses 
properties of aliasing to reconstruct frequency values, it is not able to distinguish between 
aliased and non-aliased terms until sufficiently many DFTs of coprime lengths have been 
computed, and so is unable to perform any better in the average case than in the worst case. 



A MULTISCALE SUB-LINEAR TIME FOURIER ALGORITHM FOR NOISY DATA 3 

In the empirical evaluation of |LWC13J an improvement of over two orders of magnitude 
was observed over |GMS05j and [IwelOj . 

In this paper we extend the algorithm of |LWC13J to noisy environments in two distinct 
ways. The main result of this paper is a novel multiscale error-correcting algorithm that 
utilizes offset time samples at geometrically spaced shifts. This extension is in essence 
a progressive frequency identification algorithm not unlike the /S-encoders for analog-to- 
digital conversion [DDGVOG] . The new algorithm gives excellent performance in the noisy 
setting without significantly increasing the computational costs from the noiseless case. 
For comparison purposes we also present another extension, which is a minor modification 
of the noiseless algorithm based on a certain rounding of the frequency estimates. For 
both extensions we provide detailed mathematical analysis as well as empirical evaluations. 
While both extensions work well in the noisy environment, the multiscale algorithm achieves 
comparable or superior accuracy at a significantly lower computational cost. 

The remainder of this paper is organized as follows. In Section [2] we review the notation 
introduced in [LWC13) that will be necessary for the sequel. We also describe our noise 
model, discuss some of the problems noisy signals present for the algorithm of JLWCISJ . 
and argue that in certain applications the f.2 error metric is inappropriate and should be 
replaced with a form of Earth Mover's Distance. We also describe the random signal 
model used in the empirical evaluations in Sections [3] and |4j In Section [3] we give our first 
modified algorithm and analyze the dependence of the sampling rate on the noise level. We 
also illustrate its performance with an empirical evaluation. In Section [4] we describe our 
multiscale frequency identification procedure and provide an empirical evaluation. Finally 
in Section [5] we provide a brief conclusion. 



2. Preliminaries 

2.1. Notation and brief review. In this section we introduce the notation that will be 
used in the remainder of this paper and briefly review the results in |LWC13J . We consider 
frequency-sparse band- limited signals 5" : [0, 1) — )• C of the form 



(2.1) S{t) = Y, au^e 

wen 



27riajt 



4 ANDREW CHRISTLIEB, DAVID LAWLOR, AND YANG WANG 

where $7 is a finite set of integers bounded in [— A^/2, N/2) and 7^ Ooj G C for each cj G 0. 
For simplicity we shah extend S{t) periodically to a function on the whole real line. The 
Fourier samples of S are given by 

(2.2) S{h) = I S{t)e-'^^'^h, h£Z, 



so that for signals of the form (2.1) we have S{u;) = a^^j for w G il and S{h) = for all 

other h G 7j. In practice we work with data of finite length. Given any finite sequence 

s = (sq, Si, ... , Sp-i) of length p its discrete Fourier transform (DFT) is given by 

p— 1 p— 1 

(2.3) s[h] = ^.,.e-2-iVP = J^sbWf , 

j=0 j=0 

where h = 0,1, . . . ,p — 1, s[j] := Sj and Wp := e^'^'^^'P is the primitive p-th root of unity. 
The Fast Fourier Transform (FFT) |CT65) allows the computation of s in 0{plogp) steps. 

All fast reconstruction algorithms apply the DFT to selected finite sample sets of S{t), 
and our work is no exception. Let p be a positive integer and e > 0. The two sample sets 
we use extensively are Sp and Sp,£, which are length p samples of S{t) given by 

Sp[j] = 5(-), Sp,e[j] = s{-+e), j = 0,l,...,p-l. 
For each h let Ap^h = {00 G 0, : lo = h (mod p)}. It is a simple derivation to obtain 
K[h]=P Y. «-' ^pAh]=P Y. «-e2-'"- 

In the ideal scenario where all {uj (mod p) : a; G 17} are distinct we have 

pauj h = Lo (mod p) for some uj gQ, 



Sp[h] 



otherwise, 



and similarly 



?; rr 1 f pauiG^^^^'^ h = LO (mod p) for some a; G i7. 



Thus, the nonzero elements of Sp[h] occur precisely at the locations h = uj (mod p) for some 
a; G 0, and moreover for such h we have |Sp[/i]| = |Sp,£[/i]|. Furthermore for each ui £ 0, 
and h = LO (mod p) we have -S,, = e^'^^^'^. Hence 

(2.4) Ineuj = Arg f ^P^ ] (mod 2^), 

V Sp[/i] / 
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where Arg(z) denotes the phase angle of the complex number z in [— 7r,7r). Now assume 



that we have |e| < -^. Then cj is completely determined by (2.4), as there will be no 
wrap-around aliasing. Hence 

(2.5) w = 




The weight a^ can be recovered via a^ = Sp[h]/p. 

Remark. In fact, more generally, if we have an estimate oi lo £ Q, say \uj\ < ij, then by 



taking e < j; the same reconstruction formula (2.5) holds. We will use this observation in 
Section |4] when we develop a multiscale frequency identification procedure for noisy signals. 

Of course it is possible that not all {uj (mod p) : u €z Q} are distinct. For an a; G fi we 
say CO has a collision modulo p, or simply has a collision when there is no ambiguity, if there 
is at least one other co' £ 0, such that uj = oj' (mod p). In |LWC13J a criterion is developed 
to detect collisions in the noiseless case. For cj G fi and h = to (mod p), it is clear that a 
necessary condition for no collision to occur is 

Sp[h] 
It is shown in |LWC13j that for a randomly chosen e > the converse holds with probability 



(2.6) 



I 27neaj| -i 



one, and furthermore checking the condition (2.6) for several e would be sufficient to deter- 
ministically decide whether u has a collision. In section |4] we use this latter observation to 
devise a robust test for collisions even in the presence of noise. 

The algorithm developed in |LWC13J for recovering S{t) is as follows: First we pick a 
prime p = pi, which is roughly 5k where k = \^\ is the number of modes in S{t) {k is 
commonly referred to as the sparsity of S{t)). By taking p to be about 5 times the sparsity 
we ensure that on average collisions do not occur for more than 90% oi uj £ Q. Let Q' 
denote the subset of fi consisting of all non-collision u £ il.. For each oj £ i^' we recover 
OojC^'^"^*, and update S{t) to 

5i(t) = 5(t)- J^a^e^™*. 
Luen' 

We now apply the above procedure again for Si (t) with a different prime p = p2 approxi- 
mately in the range of 5ki, where ki = k — \i^'\ is now the sparsity for S'i(t). This process 
is repeated until all modes are found. 
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In the implementation of the algorithm we set a small threshold in (2.6) to check for 



collisions. This means there is a small probability that a collision is undetected by our 
criterion and a false value loq is put into W when it shouldn't be. However no irreversible 
harm is done here because all this does is to create a new mode — cge^'^''^"* for some cq G C 
in Si{t). By the use of different primes pj in each iteration this false mode will eventually 
be identified and subtracted from the final reconstruction. 



2.2. Noise model. In practice the samples we collect will be corrupted by some noise. In 
this paper we assume an i.i.d. noise model 



(2.7) 



%[J] = S{t +nj = S,[j]+n„ 



where n = (uj) are i.i.d. complex random variables with mean and variance cr^. A typical 
model is to assume {uj} are i.i.d. complex Gaussian. In real world applications it is perhaps 
more realistic to set a bound on tlj, say |nj| < r for all j for some r > 0. In our numerical 
experiments we use Gaussian noise with a cut-off r = 2a. With the noise model we have 



p-i 



(2.^ 



S;[h] = Sp[h]+n[h] 



By the i.i.d. property for {uj} 

E[n[h]] =0 
This yields 



where n[h] = ^ Hje-^'^'^^/P. 

j=0 



and Var[n[/i]] = pa 



(2.9) 



E 



S;j[/i] =Sp[h] and 



E 



\s;[h]-Sp[h]\' 



■ pa 



Thus, a typical noisy DFT coefficient Sp[h] will deviate from the true value Sp[h] by an 
amount proportional to a^/p. Similarly, for S° = Sp^s + % we will have 



(2.10) 



E 



E 



I Sp,e ["-J ^P,£ 



pa 



Sp,e [h] = Sp,£ [h] and 

We now pick a non-collision w € il. Then for h = uj (mod p) we will have 

Sp[h] =pa^ + 0(^cr), 
S°^[/i] =pa^e'^ 
As a result a^^ can now be estimated easily via 



^2--- + 0(Vpct). 



(2.11) 



1 
p 



Sl\h] + 



\^)- 
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The real challenge lies in the recovery of the frequencies in Q. Assume that |Sp^e| has 
a pulse at h. Then h = oj (mod p) for some w G il. If there is no collision for w, in the 



noiseless environment oj is recovered via (2.5) as long as e < 1/A^. In the noisy setting 
Sp,£[/i]/Sp[/i] must be replaced by Sp£[/i]/Sp[/i]. Interestingly, the mean of Sp£[/i]/Sp[/i] is 
in general not Sp^e[/i]/Sp[/i] as a result of the division. Nevertheless we have 

Sp[/i]e2^"^^+£4/i] 



Sp,e[ 



S°[/i] 



Sp[/i]+S[ 



Spl/ile 



2it\uje 



0((tVp) 



(2.12) 



^2-..^e ^ Q (g/g^^) 

1 + O (a/a^VP) 
+ O {(j/a^^) . 



^2-Kiijje 



Thus the ratio of noisy DFT coefficients agrees with the noiseless ratio up to an error term 
on the order of a/\a^^\^. 

Given this estimate for the ratio of noisy DFT coefficients, we can derive bounds for the 
error in the Lee norm for the phase angle computed via Arg(z). Let £ be a lattice in M. 
For any ^ G M the Lee norm associated with the lattice C for 6 is given by the distance of 9 
to the lattice C, i.e. ||^||£ := min^g/; \9 — k\. Under the Lee norm associated with the lattice 
27rZ it is well known that 



(2.13) 



Arg(2; + r/) - Arg(z)||27rZ = II Arg(l + z "^r/) ||27rZ < k Si- 



Thus for a non-collision w S fi and h = uj (mod p), the estimates (2.13) and (2.12) combined 
yield 



(2.14) 



Arg 



Ittoje 



< O 



2ttI 



a 



^vp 



When we apply the estimate (2.5) for uj under the noise model we will end up with an 
approximation 

1 



OJ ' 



2tt£ 



Arg 



such that 
(2.15) 



\0J 



^Wnz < O 






a 

27re\auj\^ 
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Now if we apply the algorithm developed in |LWC13] the ratio o je^ is critical in deter- 
mining the sensitivity of our phase estimation (as well as the weight estimation) to noise. 
Without any modifications to the algorithm it is thus important that we choose the lengths 
•p so that a/ey/p is within the tolerance. 

2.3. Earth mover distance. In the existing literature on the sparse Fourier transform, the 
H.2 norm is most often used to assess the quality of approximation. There are many reasons 
for this choice, with the two most convincing perhaps being the completeness of the complex 
exponentials with respect to the £2 norm and Parseval's theorem. For certain applications, 
however, this choice of norm is inappropriate. For example, in wide-band spectral estimation 
and radar applications, one is interested in identifying a set of frequency intervals containing 
active Fourier modes. In this case, an estimate u) of the true frequency uj with |S; — c;j| <^ N 
is useful, but unless u) = uj the £2 metric will report an 0(1) error. For these reasons, we 
propose measuring the approximation error of sparse Fourier transform problems with the 
Earth Mover Distance (EMD) |RTG00j . Originally developed in the context of content- 
based image retrieval, EMD measures the minimum cost that must be paid (with a user- 
specified cost function) to transform one distribution of points into another. EMD can be 
calculated efficiently as the solution of a linear program corresponding to a certain flow 
minimization problem. 

For our problem, we consider the cost to move a set of estimated Fourier modes and 
coefficients |(cjj,c^.)} ._ to the true values { ((^j , C(^^. ) } ._ under the cost function 

di{{i^,c^),{u},ci:,)]N) := — — h \cuj -c^\. 

This choice of cost function strikes a balance between the fidelity of the frequency estimate 
(as a fraction of the bandwidth) and that of the coefficient estimate. We also consider the 
"phase-only" cost function 

d^{ui,uj2;N) := — , 

which provides a measure of how close our frequency estimates are to the true values. We 
denote the EMD using di by EMD(l) and using di^ by EMD(a;) in our empirical studies in 
Subsections 13.21 and 14.41 below. 



2.4. Random signal model. For the empirical evaluations in Subsections 3.2 and 4.4 



we consider test signals with uniformly random phase over the bandwidth and coefficients 
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chosen uniformly from the complex unit circle. In other words, given k and N, we choose 

k frequencies coj uniformly at random (without replacement) from [—N/2, N/2) 7L. The 

corresponding Fourier coefficients Oj are of the form ^ i , where Qj is drawn uniformly 

from [0, 1). The signal is then given by 

fc 
(2.16) S(t) = ^aje2'^^'^^*. 

This is the standard signal model considered in previous empirical evaluations of sub-linear 
Fourier algorithms [K^SOTl IH^^Tnl IHIKP12b[[LWCT3] . 

3. Rounding: A Minor Modification of Noiseless Algorithm 

A simple modification to the noiseless algorithm of [LWC13J for the noisy case is to 
increase the sample lengths p. By choosing -p large enough the error from noise can be mit- 
igated to be within a given tolerance. The modification can be viewed simply as rounding, 
and we include it both as a more direct and simple to implement extension as well as for 
comparison purposes. When the noise level is low, this modification yields reasonably good 
results. 

As in the noiseless case we choose the shift e > so that e < 1/N. In the noiseless case 
e = 1/A^ would be sufficient to avoid wrap-around aliasing in the phase reconstruction. Due 



to the presence of noise we will need to make e slightly smaller because of (2.15). Let us 
analyze the recovery process for an a; G i7 if we simply carry out the same process as in the 
noiseless environment. 

First we choose a length p. Assume that uj £ Q does not collide with any other uj' G 0, 
modulo p. Let h = uj (mod p). The reconstruction of uj utilizes two factors. First, the 
location of peaks in the DFT are robust to noise: even with a relatively high noise level we 



may take h = uj (mod p) to be exact. Second, by (2.15) the frequency reconstruction from 
noisy measurements is correct up to an error term of size O ((7/27re|a(^|-y/p). By combining 
these two measures we can more reliably estimate uj. 

Our proposed modification is to simply round the noisy frequency estimate 

UJ = - — Arg ' ' 



2vre \ S 



n 
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to the nearest integer of the form np + h. This improved estimate is therefore given by 

(3.17) uj' = p ■ round I ) + h, 

\ P J 

where round(2;) returns the nearest integer to x. For low noise levels this modification will 
return the true value w, while for larger noise levels it is possible that w deviates by more 
than p/2 from the true frequency u). In this case the estimate oj' will be wrong by a multiple 
of p. Larger values of p will reduce the likelihood of an error in frequency estimation. 

To ensure that the estimated frequencies are sufficiently far from the branch cut of Arg(z) 
along the negative real axis, we take the shift e < 1/2N. The estimated frequencies then 
satisfy — A^ < u < N, while the true frequencies lie in the smaller interval [—N/2, N/2). 
It is thus extremely unlikely that the deviations due to the noise will push the estimates 
across the discontinuity. 

We saw in the previous section that the error in the phase estimation is on the order of 
^p-i/2 Tf^Y^Qj^ using the reconstruction formula (2.5). When using the rounding procedure 



(3.17), however, we should expect accurate results for a wider range of sample lengths p 
and noise levels a. Indeed, note that the rounded frequency estimate uJ' is exact as long as 

(3.18) |S-a;|<|. 



Recall from Section 2.2 that the error of the frequency estimate uJ is on the order of 
0{a/e^). Let us assume that it is bounded by Co/e^ for some constant C. Com- 
bining this with the requirement (3.18) we see that the rounded frequency estimate Zj' will 
be exact provided 

It follows that we get exact reconstruction if p > {2Ca/e)'^'^. 

To illustrate this relationship, we generated 1000 test signals with frequencies chosen 
uniformly at random from [—N/2, N/2) and set the corresponding coefficient to unity. Thus 
our test signals for this empirical trial were one-term trigonometric polynomials. For this 
test we took A^ = 2^^ and investigated a range of parameters (u, p) . We reconstructed the 
frequencies in two ways: first, simply using the formula ( |2.5[ ), and second by combining this 
estimate with the rounding procedure (3.17). In Figure IT] we plot the average phase error 
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mean phase error 
without rounding 
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Figure 1. (left) Mean phase error (in log scale) for frequency estimation 



via (2.5). (right) Mean phase error (in log scale) for frequency estimation 
with rounding via ( 3.17[ ). The red dotted line marks the transition to exact 
recovery when p > (2o"/e)^'^. 



in logarithmic scale as a function of both a and p, which were varied from 2.5 x 10 ^ to 
0.4906 and from 10 to 163840, respectively, by powers of two. 

In the plot on the left, which corresponds to reconstruction using only ( |2.5[ ), we can clearly 
see the contours of constant phase error obeying the relationship log2(p) = 21og2(cj) + a 



for various a. This confirms our analytic estimate from Subsection 2.2 that the phase 
error is proportional to cr/^/p. In the plot on the right, which corresponds to the improved 



reconstruction using (3.17), we can see that for large values of a and small values of p the 



same relationship holds. However, for smaller a and larger p we see an abrupt transition to 
exact reconstruction (the white area in the upper- left). The boundary of this region (red 
dotted line) follows the relationship log2(p) = olog2(cr) + 16, corresponding to C = 1 in 



(3.19) above. This illustrates that for small enough values of the ratio a/ep^'"^ the rounding 



procedure is exact. 



3.1. Algorithm. Our first algorithm for noisy signals is only a slight modification of the 



noiseless algorithm presented in |LWC13( Algorithm 1]. Considering (3.19), we change the 
lower bound 



(3.20) 



p > cik 
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to 

(3.21) p > max{ciA;, C2{a/ef/^}. 

In this way we ensure that the choice of p is always large enough to isolate most of the k 
frequencies on average as well as being large enough to ensure that the rounding procedure 



(3.17) is exact. In all of our experiments below we took C2 = 4. 



3.2. Empirical evaluation. In this section we describe the results of an empirical evalu- 



ation of the algorithm with the rounding procedure (3.17) and new choice of sampling rate 



p given by (3.21). We focus on two aspects of the algorithm's performance: accuracy as 



measured in the EMD(l) metric (c.f. Subsection 2.3), and runtime as a function of both the 
sparsity k and the noise level a. In all of the experiments reported below, we report aver- 
ages over 100 random test signals generated according to the prescription in Subsection |2.4[ 
The bandwidth for these test was fixed at A^ = 2^^, and the shift e was taken to be 1/2N. 
All experiments were conducted in C++ on a Linux machine with four Intel Xeon X5355 
dual-core processors at 2.66 GHz and 64 Gb of RAM. The Intel compiler was used with 
optimization flag -03. All FFTs are performed using FFTW3 |FJ05| . and timing results are 
reported in CPU ticks using the "cycle. h" header included with the FFTW3 source code. 

3.2.1. Accuracy. In Figure ^^ we plot the average EMD(l) error of the algorithm with the 
rounding modification as a function of the noise level a. It is clear that the error increases 
linearly with the noise level, which indicates that the method is robust with respect to 
additive noise in the signal. Moreover, the 'EiMD{uj) error was zero for all trials of the 
rounding algorithm, confirming that taking p > C2(o"/e)^''^ suffices to exactly recover the 
true frequencies. 

3.2.2. Runtime. In Figurep](A) we plot the average runtime (in CPU ticks) as a function of 
the sparsity k for a fixed value of the noise level a = 0.512 and the parameter ci = 2. Note 
that the Fourier coefficients in our test signals all have unit magnitude, so this is a severely 
disadvantageous situation. For this high level of noise, we see that there is no dependence 



on k until k = 64; this is a consequence of the requirement (3.21 ) on the choice of sampling 
rate. As a reference for runtime comparisons, FFTW3 takes approximately 10^ CPU ticks 
on the same machine. Thus at this noise level our modified algorithm is slightly slower than 
a highly optimized FFT implementation. 
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10 
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Figure 2. Average EMD(l) error of the algorithm with rounding modifi- 
cation as a function of noise level a. The EMD(a;) error was zero for all 
runs. 

In Figure p^ (B) we plot the average runtime (in CPU ticks) as a function of the noise 
level a for a fixed value of the sparsity k = 64. From the figure we can see the approximate 



dependence of the runtime on o"^'^, as dictated by the choice of p in (3.21). Moreover, 



for a < 0.032, the algorithm is faster than FFTW. In the next section we develop a more 
sophisticated algorithm which outperforms FFTW for a wider range of noise levels. 



4. A MuLTiscALE Algorithm 

In Section h^ we saw that taking p > uiax{cik, C2{cr /e)^'^} sufficed to ensure that the 
rounding procedure was exact. While this gives good results in terms of accuracy, the in- 
creased runtime associated with larger noise levels is undesirable. The main contribution 
of this paper is a multiscale algorithm for recovering the frequency set il of the signal S{t). 
This algorithm achieves equal if not superior accuracy while providing improvement by 
several orders of magnitude in computational efficiency. The key feature of this multiscale 
algorithm is the employment of multiple shifts Ej , which enable us to improve the accuracy 
of the phase estimations progressively without the need to significantly increase the sam- 
ple length p. In Subsection |4.1| we give some background on our multi-scale method and 
introduce the main idea of our algorithm. In Subsection |4.2| we prove that our multiscale 
approximations are accurate estimates of the true frequencies. In Subsection|4.3|we describe 



the basic multi-scale algorithm, and in Subsection 4.4 we report the results of an empirical 
evaluation. 
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rounding only, sigma = 0.512, c = 2 
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noise level a 
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Figure 3. (A) Average runtime (in CPU ticks) vs. sparsity k for the al- 
gorithm with rounding modification. (B) Average runtime vs. noise level 
a. 



4.1. Multiscale frequency estimation. The main idea for the multiscale algorithm is 
that a value can be estimated with high precision with an inaccurate (coarse) estimator 
applied progressively at different scales, much like in analog-to-digital conversion where a 
signal value can be estimated with very high precision by the very coarse binary quantiza- 
tion. In our sparse Fourier recovery algorithm, the coarse estimator is the approximation 



formula given by (2.14) 
(4.1) 



1 , ^le[h] 

euj =1 — Arg -^ 
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where =z is measured by the Lee norm || • ||g. 

For simphcity let us assume for the moment that our signal contains a single frequency 
a; with non-zero Fourier coefficient. For a fixed p, let uj be our estimate for uj using the 
rounding procedure from Section p] with shift Sq < 1/N. Then we have 

(4.2) UJ = to (mod p), 

although in general cj may differ from w by a multiple of p. 

Suppose now that we repeat the computation of uj using a larger shift ei > £0] that is, 
we sample our signal at time points j/p + ei, take the FFT, and compute 

(4.3) ,, = ^aJ%M] 

(note that we do not divide by £1). Since in general £1 > 1/A^, we cannot take 61 /ei as an 
estimate for uj, although it still holds that 

(4.4) 6i«eia;(mod[-i,i)), 

where x (mod [— ^, ^)) is the unique value y in [— |, ^) such that x = y (mod 1). We can 
use this fact to estimate the error u — uj as follows. Note that 

(4.5) £l{uj — UJ) = £lUJ — £lUJ 

^ {bi-£iSj) (mod [-5,5)), 
so that 



(4.6) UJ — UJ ^ {bi — £iuo) (mod [— ^, ^))/ei. 

This estimate of the error is not exact, since there is still noise that can perturb the 



calculated value 61 from the true value £iuj (mod [—2, 2))- However, analogous to (2.15) 
we have 

(^ _ S) _ (6, _ e,u) (mod [-^ i))/ei = O (^) , 

which immediately implies that the updated estimate satisfies 

(4.7) ^ _ (S + (61 - eiu) (mod [-^ i))/ei) = O (^) . 

Since £1 > £q, adding the correction term (4.6) to our previous estimate uj will give a finer 
approximation to the true frequency uj. By iterating this error correction process with 
progressively larger shifts £j, we obtain an algorithm which adaptively corrects for the error 
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in a multiscale fashion. In the next section we provide a detailed analysis of this niultiscale 
approximation scheme, and prove that the frequency estimates it produces are accurate. 



4.2. Analysis of multiscale approximations. We begin with a technical lemma relating 

1 1- 

'2' 2' 



arithmetic in the Lee norm || • \\z to that on the interval [— k, ^). It will be used repeatedly 



in the sequel. 

Lemma 4.1. Let 6 > and x G [— 2 ~'~ '^' 2 ~ "^1 ■ Assume that \\x — b\\z < 6 and 6 G [—2,2)- 
Then \x — b\ < 5. 

Proof. Let r = \\x — b\\i- Then x — b = itr + k for some k £ Z. li k = we are done. 
Assume k ^ 0. Note that |a; — 6| < |x| + \b\ < 1 — 6. However, |ibr + A;|>l — r>l — 5. 
This is a contradiction. I 

The following theorem formalizes the multiscale frequency estimation procedure which 
was introduced in the previous subsection. 

Theorem 4.2. Let ui G [—^, y)- Let < Sq < ei < ■ ■ ■ < Em and bQ,bi, . . . ,bm G 1^ such 
that 

\\£jUJ — bj\\i < 6, < j < m 

where < 6 < ^. Assume that eq < ~w~ ^'^^ f^J '■~ ^j/^j-'^ ^ (1 — 26)/ (26). Then there 
exist Co, ci, . . . , Cm G M, each computable from {ej} and {bj}, such that 

j- m m 

\io — uj] < — II /^i" > where uj := N^ — . 

Proof. Denote a;o := co. We first note that |eo'^o| ^ ^oy ^ ^~^- Let cq = bo (mod [— |, ^)), 
so that l^o'^o — Col < 6 by Lemma |4.1[ Let Aq = cq/eq, which represents a coarse estimate 
of loq with the error bound |Ao — wqI < 6/eo. 

Let oji = loq — Xq. By the above |wi| < 6/eo and 

lEiuil < — = Pi6 < - - 6. 
£0 2 

Now \\eiuj — bi\\x = \\eiuji — (61 — eiAo)||z < S- Set ci = bi — EiXq (mod [— i, 4)). It follows 



from Lemma 4.1 again that \eicoi — ci| < 6. We set Ai = ci/ei. 
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We can recursively define Cj,\j and ujj for all 1 < j < ?n,. In general we define Uj := 



Wo-i — A,_i. This leads to 



\ejujj\ < 



ej6_ 



M < 



Set Cj = {bj — £jXj-i) (mod [— ^,^)). Then \\ejUjj — Cj\\z < S. Lemma 4.1 now yields 
\£jU}j — Cj\ < 5. Set Xj = Cj/sj. 

Finally denote uJm+i = ^m — Xm- It is straightforward now to verify that 



UJ = U)Q 



E^.+ 



'^m+l 



j=0 



E 

i=o 



^m+l- 



Furthermore, by construction Wm+i = ^m — Am, which has |ix)m+i| < S/sm- By hypothesis 
Em = £oWj=il^j- Thus 



|Wm+l| < — TT/3,- ^• 



The theorem is now proved. 



Remark 4.1. From the proof of Theorem 4.2 the values Cj and uj are explicitly computable 



through the recursive formula uq = uj, cq = bo (mod [—2, g)), ^^0 = co/eo and 



a;. 



(4.8) 

for 1 < j < m. 



LOj^l - Aj_i 

{bj - EjXj-i) (mod [-5,5)) 

Cj/Ej 



Corollary 4.3. Assume that in the above theorem we have f3j = (3 where /3 < (1 — 25)/ (25), 



i.e. £j = f3^£o for all j. Let p > and m > 



log, 



25 



' ' " £0 2 



J 
+ 1. Then 

P 



Proof. This is a straightforward corollary. By Theorem 4.2 we have 



a; — w < 



eo 



n^7" 



i=i 



eo 



-/3" 



It is easy to check that m 



log 



2(5 
/3 peo 



+ 1 is the smallest integer such that — /3 "^ < §. 



Note that as we have mentioned in Section 3, even with noise the value uj (mod p) can 
be precisely computed very reliably. Thus if the difference |a; — 51 is smaller than | then uj 
can be recovered exactly by taking the closest integer to uj with the same residue modulo p. 
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In numerical tests we usually choose uniform 13 j = /3. While making f3 as large as it can 
be for a given error estimate 6 will undoubtedly reduce the computational cost, there is 
nevertheless a good reason that we should not be too "greedy" and be more conservative 
by choosing a smaller /? > 1. The reason is that given the random nature of the noise the 
error bound 6 is only in the average sense. To minimize reconstruction errors we should 
try to provide as much latitude as possible for the uncertainties associated with the error 
estimate 6. Hence it is useful to ask how much latitude does one get for given choices of eo 
and /3. 

Theorem 4.4. Let oj G [— y ; y); eo > and (3 > 1. Set Ej = /3-^eo for 1 < j < m. Assume 
that we have 6o, 6i, . . . , ^m G I^ such that 



EjOJ — bj ||z < <5, I < j < in 

1 - EqN 1 



where 

(4.9) 5 = min , , ^ 

^ ^ V 2 '2/? + 2 

Then the estimate ui of uj given by oj := YlTLo '^ satisfies 

|w-w| < — /3-™, 
£0 



where Cj are given in {4-S). 



Proof. The proof is straightforward. Note that Theorem |4.2| holds under the conditions 
^ and f3j < i^ 



eo ^ ~A/^ and (3j < „^ . These two conditions are equivalent to the condition 6 < 



min ( — 2^—, 2B+2 ) • Clearly, 6 = min ( — |S— , 2B+2 ) ^^ *^^ largest admissible value for 6. I 
4.3. Algorithm. In this section we provide some details of our implementation of the mul- 



tiscale frequency estimation procedure described in Subsection [4TJ In particular, we discuss 
the choice of various parameters necessary for reconstruction according to Theorem 4.2 as 
well as changes made to the aliasing detection test from |LWC13j to improve robustness in 
the presence of noise. 



4.3.1. Number of iterations. Recall from Corollary 4.3 that, for constant (3j = (3, m 



log^ -^ +1 shifts suffices to ensure that the estimated frequency satisfies |S; — a;| < |. As 
in Section^ we take eo = 1/2A^ to avoid the branch of Arg(z); furthermore, as in |LWC13J 
the sample lengths are p are chosen proportional to the sparsity k. Thus after 0(log(A^/A;)) 
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iterations, by rounding the approximate frequency uj to the closest integer of the form np-\-h, 
where h = uj (mod p) is known from the location of the peak in 8°, we will recover the true 
frequency to. With the results of |LWC13j this immediately implies that the average-case 
runtime of the multiscale algorithm is 0(klog{k) log{N/k)). 



4.3.2. Choice of (3. It remains to determine the choice of /3, given the sample length p and 



the noise level a. Recall from the proof of Theorem 4.2 that the estimated frequency uJ is 
given by the sum "^JLi -^ji where Xj = Cj/ej. Moreover, the difference between successive 
frequency approximations is given in terms of Xj as 



UJi 



A,_i 



A, 



UJj^l - Aj_i =^ Aj - UJj - Wj + i. 
Thus we can decompose the error of approximation at stage j + 1 as 

\UJ - CJj + ll = \{u)j - UJj + i) - {ujj - Uj)\ 

(4.10) = \Xj - {ojj - uj)\. 



By Theorem 4.2 the left-hand side of (4.10) satisfies 

<5 



^7 + 1 1 < 



Sj+l 



while analogously to (2.15) the right-hand side of (4.10) satisfies 



I'' -<"'-">! so (24^)- 



Denoting by c^ the constant in the right-hand side above and equating the two upper bounds 
gives 

2-k6^ _ Ej+i 



(4.11) 



Ca<J 



^3 



=:/3. 



As mentioned after Corollary |4.3[ in our implementation we take j3 to be somewhat smaller 



than (4.11) would suggest, using the value (3 



2Co-(T 



. Note that for large a and small p this 



quantity could be less than one. In order to ensure /3 > 1 we require the sample lengths 
to satisfy p > max{ci/c, (2co-cr)^}; due to this the runtime bound in the previous subsection 
holds only for a < -j^ 



/cik 



4.3.3. Robust aliasing test. As noted in Subsection 2.1, our frequency estimation procedure 
works only for non-collsion uj. In [LWC13J two tests were given to determine whether a 
collision had occurred at a candidate frequency. In the implementation of that algorithm in 
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the noiseless setting, requiring the ratio (2.6) to be within some threshold of unity sufficed 



to detect collisions. In the setting of the current paper, where the samples are corrupted 
with noise, we resort to the second of the tests given in in [LWC13J . which examines the 



ratios (2.6) for several values of e. For < j < ?ti we compute the ratio (2.6) and compare 
it with a threshold r. We count the number of times the ratio exceeds r and reject those 
frequencies which fail more than an ij fraction of the tests. Since we expect fluctuations 
in this ratio due to noise of order o j ^ we set r to be a small constant multiple of this 
quantity. 

In Algorithm [T] we give pseudocode for the iterative frequency estimation procedure; the 
full algorithm is given by replacing the calculation of frequencies in |LWC13( Algorithm 1] 
with this procedure. 

4.4. Empirical evaluation. In this section we present the results of an empirical evalua- 



tion of our multiscale algorithm. As in Subsection 3.2, we focus on accuracy and runtime, 
and report average quantities over 100 random test signals. The test signals were generated 
in the same manner as in Subsection 13.21 and were evaluated on the same machine. After 
extensive testing it was determined that the choice of parameters ci = 2, C(j = 6, f] = 4 
gave a satisfactory balance between runtime and accuracy. 

4.4.1. Accuracy. In Figure H we plot the average EMD(l) error as a function of the noise 
level a. The EMD(a;) error was zero for all but one of the trials. From the figure it is clear 
that the multiscale algorithm performs well in the presence of noise, even with ci as small as 
2. This is in contrast to the rounding algorithm of Section^ which required p > (c/e)^' ^ to 
achieve a similar accuracy. The multiscale error correction allows us to take much coarser 
sampling rates to achieve a given error. As we show in the next subsection, these coarser 
sampling rates lead to much improved runtime. 

4.4.2. Runtime. In Figure Is] (A) we plot the average runtime (in CPU ticks) as a function 
of the sparsity k. From the figure we can see that the runtime scales slightly super linearly 



with k, which is expected given the runtime bound 0(A; log(A;) log(A^/A;)) of Subsection 4.3.1 
Moreover, we note that for all levels of sparsity tested, the multiscale algorithm outperforms 
FFTW (which, as noted in Subsection 3.2, takes approximately 10^ CPU ticks on the same 
length signal). 
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Algorithm 1 MultiscaleFreqEst 



Input: S{t), N,k,p,a,Ca,'i] 
Output: {oJe}'l^i 



10: 



15: 



20: 



Vp 



P 



Vp 

2c^cr' 



m 



1 + 



log.f 



vote^ <— 0, 



FFT of ^-samples of S{t) 



for J = to m, do 

Ej ^ p^/2N 

Sp,ej <— FFT of ej-shifted ^-samples of S{t) 

for £ = 1 to /c do 

h ^ index of i^^ largest peak in Sp 



if r > T then 

vote^ -^ vote^ + 1 
end if 

^Arg 



if J = then 
else 



W£ 



i^^? + {bj - SjUJe) (mod [-2, 2))/^] 



P 



end if 

a j = m then 

uj£ -^ p ■ round 
end if 
end for 
end for 

return uji with vote^ < r]{m + 1 



+ h 



In Figure p^ (B) we plot the average runtime as a function of the noise level a. Only a 
very mild dependence of the runtime on a is visible for a < 0.128. The slight increase for 
larger a is due to the requirement (3 > 1, as noted in Subsection 4.3.2[ Note the contrast 
with Figure^ (B) for the rounding algorithm, where the dependence was of the form a'^'^ 
as required by the choice of sampling rate p. 



5. Conclusion 



In this paper we gave two extensions of the sparse Fourier algorithm of |LWC13J to handle 
noisy signals. The first of these was a minor modification of the original algorithm that 
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multiscale, N=2^^, k=64, c =2, c =6, ri=0.25 

1 a 




10 10 
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Figure 4. Average EMD(l) error vs. noise level a for the algorithm with 
multiscale frequency identification. 



involved rounding frequency estimates to the nearest integer with the correct residue modulo 
the sampling rate. We showed that in order for this modification to correctly identify the 
true frequencies in Gaussian noise of standard deviation a the sampling rate needed to 
satisfy p > u^'^. While this resulted in accurate approximations of the Fourier transform 
in the EMD(l) metric, the sampling rate requirement forced the algorithms to be slow in 
practice. 

The second extension overcame this pitfall by introducing a novel multiscale approach 
to frequency estimation in the sparse Fourier transform context. By using samples of the 
input at multiple time shifts spaced geometrically, our algorithm exhibits a form of error 
correction in its frequency estimation. This allows the use of much coarser sampling rates 
than the first modification, which in turn leads to greatly reduced runtimes in our empirical 
evaluation. The error correction of our multiscale algorithm is to the best of our knowledge 
novel in the sparse Fourier transform context, and we believe it is a promising approach for 
further investigation. 

Acknowledgments 



During the preparation of this manuscript we became aware of similar work by Laurent 
Demanet and his collaborators. We kindly acknowledge their generosity in sharing their 
work with us. 



A MULTISCALE SUB-LINEAR TIME FOURIER ALGORITHM FOR NOISY DATA 



23 



w 10' 






CD 



^ 10^ 



cc 



10 



-.22 



multiscale, N=2^'', a=0.512, c^=2, c^=6, ri=0.25 



10' 
sparsity k 

(A) 



10^ 



-.22 



multiscale, N=2 , k=64, c =2, c =6, ri=0.25 

1 a 




noise level a 

(B) 

Figure 5. (A) Average runtime (in CPU ticks) vs. sparsity k for the al- 
gorithm with multiscale frequency identification. (B) Average runtime vs. 
noise level a. 
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